Statistical Applications in Genetics and Molecular Biology
Abstract
We develop an approach for microarray differential expression analysis, that is, for identifying genes whose expression levels differ between two or more groups. Current approaches to inference rely either on full parametric assumptions or on permutation-based techniques for sampling under the null distribution. In some situations, however, a full parametric model cannot be justified, or the sample size per group is too small for permutation methods to be valid. We propose a semi-parametric framework based on partial mixture estimation, which requires a parametric assumption only for the null (equally expressed, EE) distribution and can handle small sample sizes where permutation methods break down. We develop two novel improvements of Scott's minimum integrated square error criterion for partial mixture estimation [Scott, 2004a,b]. As a side benefit, we obtain interpretable, closed-form estimates of the proportion of EE genes. Pseudo-Bayesian and frequentist procedures for controlling the false discovery rate (FDR) are given. Results from simulations and real datasets indicate that, for small sample sizes, our approach can provide substantial advantages over the SAM method of Tusher et al. [2001], the empirical Bayes procedure of Efron and Tibshirani [2002], the mixture of normals of Pan et al. [2003], and a t-test with p-value adjustment [Dudoit et al., 2003] to control the FDR [Benjamini and Hochberg, 1995].
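As a rough illustration of the two ingredients named in the abstract, the sketch below shows (a) the closed-form weight obtained from Scott's basic integrated square error (L2E) criterion when a single normal component w * N(mu, sigma^2) with known mu and sigma is fitted to the data, and (b) the Benjamini and Hochberg step-up rule. It is not the paper's method: the two improved criteria are not reproduced here, and the N(0, 1) null, the function names, the clamping of the weight to 1, and the simulated scores are all assumptions made for the example.

```python
import numpy as np


def l2e_null_proportion(z, mu=0.0, sigma=1.0):
    """Closed-form L2E weight for the partial mixture w * N(mu, sigma^2).

    With mu and sigma fixed (assumed known null), the criterion
        w^2 / (2 * sigma * sqrt(pi)) - 2 * w * mean(phi(z_i))
    is quadratic in w and is minimised at
        w = 2 * sigma * sqrt(pi) * mean(phi(z_i)).
    """
    z = np.asarray(z, dtype=float)
    phi = np.exp(-0.5 * ((z - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))
    w = 2.0 * sigma * np.sqrt(np.pi) * phi.mean()
    # Clamp to 1 so the result can be read as a proportion (a choice made here).
    return float(min(w, 1.0))


def benjamini_hochberg(pvalues, alpha=0.05):
    """Boolean rejection mask from the Benjamini-Hochberg step-up rule."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)
    thresholds = alpha * np.arange(1, m + 1) / m
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max()  # largest rank meeting its threshold
        reject[order[: k + 1]] = True
    return reject


def simulate_scores(n, prop_null, shift, seed=0):
    """Hypothetical test data: N(0, 1) for EE genes, N(shift, 1) otherwise."""
    rng = np.random.default_rng(seed)
    is_null = rng.random(n) < prop_null
    return np.where(is_null, rng.normal(0.0, 1.0, n), rng.normal(shift, 1.0, n))


if __name__ == "__main__":
    z = simulate_scores(20000, prop_null=0.8, shift=4.0)
    print("estimated EE proportion:", round(l2e_null_proportion(z), 3))
    print(benjamini_hochberg([0.001, 0.008, 0.039, 0.041, 0.6]))
```

Because the non-null scores contribute little density near zero when they are well separated from the null, the weight in this toy setting lands close to the true EE proportion, with a slight upward bias that grows as the two groups overlap more.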
Similar resources
Strategies and Clinical Applications of Next Generation Sequencing
Abstract: DNA sequencing is one of the most valuable techniques in molecular biology; it is used to determine the sequence of nucleotides in a DNA fragment. High-throughput sequencing, known as Next Generation Sequencing (NGS), has revolutionized genomic research and molecular biology, so that the whole human genome can now be sequenced at low cost in several days. NGS technology is simi...
SLC2A4 Polymorphisms Can Be a New Molecular Biomarker for Sports Genomics
"SLC2A4 Polymorphisms Can Be a New Molecular Biomarker for Sports Genomics" is an editorial article and has no abstract.
Statistical Applications in Genetics and Molecular Biology
This note is a comment on the article “Dimension Reduction for Classification with Gene Expression Microarray Data” that appeared in Statistical Applications in Genetics and Molecular Biology (Dai et al., 2006).
Expression Analysis of PKS13, FG08079.1 and PKS10 Genes in Fusarium graminearum and Fusarium culmorum
Background: Identification and quantification of mycotoxins produced by Fusarium species are important in controlling fungal diseases. Objectives: The potential for zearalenone, butenolide and fusarin C production was investigated at the molecular level in five Fusarium graminearum and five F. culmorum isolates. Materials and Methods: Presence of PKS13, FG08079.1 and PKS10 genes, associated with produ...
Molecular Epidemiology of Breast Cancer among Iranian-Azeri Population based on P53 Research
Background: This study was conducted to improve our understanding of the molecular and epidemiological features of breast cancer in the Azeri population, with special emphasis on the detection of TP53 mutations. We also analyzed the P53 codon 72 polymorphism (rs1042522) and its role in susceptibility to breast cancer. Methods: ...